Efficient Online Bandit Multiclass Learning with Õ(√T) Regret
Abstract
We present an efficient second-order algorithm with Õ((1/η)√T) regret for the bandit online multiclass problem. The regret bound holds simultaneously with respect to a family of loss functions parameterized by η, for a range of η restricted by the norm of the competitor. The family of loss functions ranges from hinge loss (η = 0) to squared hinge loss (η = 1). This provides a solution to the open problem of Abernethy and Rakhlin (An efficient bandit algorithm for √T-regret in online multiclass prediction? In COLT, 2009). We test our algorithm experimentally, showing that it also performs favorably against earlier algorithms.
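To make the two endpoints of the loss family concrete, here is a minimal Python sketch of hinge loss and squared hinge loss on a scalar margin. The abstract does not give the formula for intermediate values of η, so `blended_loss` below is a hypothetical convex blend chosen only because it matches the stated endpoints (η = 0 and η = 1). It should not be read as the paper's definition, and all function names are illustrative.

```python
def hinge(margin: float) -> float:
    """Standard hinge loss: zero once the margin reaches 1."""
    return max(0.0, 1.0 - margin)


def squared_hinge(margin: float) -> float:
    """Squared hinge loss: the hinge loss, squared."""
    return max(0.0, 1.0 - margin) ** 2


def blended_loss(margin: float, eta: float) -> float:
    """Hypothetical interpolation (an assumption, not the paper's formula).

    Recovers hinge loss at eta = 0 and squared hinge loss at eta = 1.
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    return (1.0 - eta) * hinge(margin) + eta * squared_hinge(margin)
```

For a margin of 0.5, the blend gives the hinge value at η = 0 and the squared hinge value at η = 1. Any margin of at least 1 incurs zero loss for every η.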
Related resources
Efficient Online Bandit Multiclass Learning with $\tilde{O}(\sqrt{T})$ Regret
We present an efficient second-order algorithm with Õ((1/η)√T) regret for the bandit online multiclass problem. The regret bound holds simultaneously with respect to a family of loss functions parameterized by η, for a range of η restricted by the norm of the competitor. The family of loss functions ranges from hinge loss (η = 0) to squared hinge loss (η = 1). This provides a solution to the ...
Newtron: An Efficient Bandit Algorithm for Online Multiclass Prediction
We present an efficient algorithm for the problem of online multiclass prediction with bandit feedback in the fully adversarial setting. We measure its regret with respect to the log-loss defined in [AR09], which is parameterized by a scalar α. We prove that the regret of NEWTRON is O(log T) when α is a constant that does not vary with the horizon T, and at most O(T^{2/3}) if α is allowed to increase t...
Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization
We introduce an efficient algorithm for the problem of online linear optimization in the bandit setting which achieves the optimal O*(√T) regret. The setting is a natural generalization of the nonstochastic multi-armed bandit problem, and the existence of an efficient optimal algorithm has been posed as an open problem in a number of recent papers. We show how the difficulties encountered by...
An Efficient Algorithm for Bandit Linear Optimization
We introduce an efficient algorithm for the problem of online linear optimization in the bandit setting which achieves the optimal O*(√T) regret. The setting is a natural generalization of the non-stochastic multi-armed bandit problem, and the existence of an efficient optimal algorithm has been posed as an open problem in a number of recent papers. We show how the difficulties encountered b...
Stochastic Online Greedy Learning with Semi-bandit Feedbacks
The greedy algorithm has been extensively studied in the field of combinatorial optimization for decades. In this paper, we address the online learning problem when the input to the greedy algorithm is stochastic with unknown parameters that have to be learned over time. We first propose the greedy regret and ε-quasi greedy regret as learning metrics comparing with the performance of offline greedy al...